True - Yes
False - No
Based on the combination - District+Janpad+Gram Panchayat+Village+Name of head of household+GOI ID
"""Data Error Condition 1: Number of Duplicate Records identified using Duplicate Key above"""
raw_data_dup = raw_data[raw_data["is_duplicate_record"] == True]
raw_data_dup.shape[0]
Deduplicating the data
Number of unique Records -
a. All the values in Column_1 are missing. b. There is no other missing value error.
Major reasons for no usage of toilet is the superstructure defect
raw_data.pivot_table(index=['District','Janpad','Gram Panchayat'],
values=['Total Household',
'No Toilet Seat',
'Water Seal Not Connected to Toilet Seat',
'No/Broken Platform',
'No Water',
'Used as Animal Shelter',
'No Wall',
'No Door',
'No Chamber',
'Pit Full',
'No/Broken Pit',
'No Cap',
'Used as Storage',
'Used for other purpose',
'No Reason'],
aggfunc={'Total Household':np.sum,
'No Toilet Seat':np.mean,
'Water Seal Not Connected to Toilet Seat':np.mean,
'No/Broken Platform':np.mean,
'No Water':np.mean,
'Used as Animal Shelter':np.mean,
'No Wall':np.mean,
'No Door':np.mean,
'No Chamber':np.mean,
'Pit Full':np.mean,
'No/Broken Pit':np.mean,
'No Cap':np.mean,
'Used as Storage':np.mean,
'Used for other purpose':np.mean,
'No Reason':np.mean},
margins=True).apply(lambda x: round(x*100,2))
Number of duplicate records region wise
dup_hh=raw_data_dup.pivot_table(index=["District","Janpad","Gram Panchayat"],
values='Total Household',
aggfunc=np.sum).sort_values('Total Household', ascending=False)
create_download_link(dup_hh,filename="Toilet Not Usable - Duplicate Records.csv")
Household Toilet Status
hh_tnu_tlt_status = raw_data.pivot_table(index=['District','Janpad','Gram Panchayat'],
values=['Total Household',
'% Headed by Women',
'% Headed by Person with Disability',
'% Having No Toilet',
'% Having No Water Source',
'% Households having Toilet but no Water Source'],
aggfunc={'Total Household':np.sum,
'% Headed by Women':np.mean,
'% Headed by Person with Disability':np.mean,
'% Having No Toilet':np.mean,
'% Having No Water Source':np.mean,
'% Households having Toilet but no Water Source':np.mean},
margins=True)
create_download_link(hh_tnu_tlt_status,filename="Toilet Not Usable - Household Toilet Status.csv")
Households by Socio Economic Condition
hh_by_sco_eco=raw_data.pivot_table(index=["District","Janpad", "Gram Panchayat"],
columns=["Economic status", "Sub-category"],
values=['Total Household'],
aggfunc={'Total Household':np.sum},
margins=True)
create_download_link(hh_by_sco_eco,filename="Toilet Not Usable - Households by Socio Economic Condition.csv")
List of Households Not Having Toilet
hh_no_tlt=raw_data[raw_data['Has Toilet']!="हाँ"].pivot_table(index=["District",
"Janpad",
"Gram Panchayat",
"Village"],
values='Total Household',
aggfunc=np.sum).sort_values('Total Household', ascending=False)
create_download_link(hh_no_tlt,filename="Toilet Not Usable - Households not having toilet.csv")
Number of records where toilet is available in the household but type of toilet has either not been provided or mentioned as not available
raw_data_tlt_nttype.pivot_table(index=["District","Janpad","Gram Panchayat"],
values='Total Household',
aggfunc=np.sum).sort_values('Total Household', ascending=False)
create_download_link(raw_data_tlt_nttype, filename="Toilet Not Usable Error - Type of toilet not provided.csv")
Number of records where toilet is not available in the household but type of toilet has been mentioned
raw_data_ntlt_ttype_pv=raw_data_ntlt_ttype.pivot_table(index=["District","Janpad","Gram Panchayat"],
values='Total Household',
aggfunc=np.sum).sort_values('Total Household', ascending=False)
create_download_link(raw_data_ntlt_ttype_pv,filename="Toilet Not Usable Error - No Toilet but Type Mentioned.csv")
Number of records where toilet is not available in the household but reason for not using it is mentioned
raw_data_ntlt_nrsn_pv=raw_data_ntlt_nrsn.pivot_table(index=["District","Janpad","Gram Panchayat"],
values='Total Household',
aggfunc=np.sum).sort_values('Total Household', ascending=False)
create_download_link(raw_data_ntlt_nrsn_pv,filename="Toilet Not Usable Error - Toilet not used but no reason.csv")